BioKB - Text mining and semantic technologies for the biomedical content discovery
نویسندگان
چکیده
The ever-increasing number of publicly available biomedical articles calls for automatic information extraction from digitized publications. We have implemented a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access semantic content of thousands of abstracts and full text articles from PubMed and Elsevier. The text mining component analyzes the articles content and extracts relations between a wide variety of concepts, extending the scope from proteins, chemicals and pathologies to biological processes and molecular functions. Moreover, the relations are extracted along with the context which specifies localization of the detected events, preconditions, temporal and logic order, mutual dependency and/or exclusion. Extracted knowledge is stored in a knowledge base publicly available for both, human and machine access, via web interface and SPARQL endpoint. To address the data accessibility, reusability and interoperability, all the extracted relations are standardized using unique resource identifiers (URIs) and a custom ontology based on Genia ontology.
منابع مشابه
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-Scale R&D IT Projects depend on many sub-technologies which need to be understood and have their risks analysed before the project can begin for their success. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other is often complex and form a network. Discovery of this network of technologies is time consumi...
متن کاملKnowledge Management for Biomedical Literature: the Function of Text-mining Technologies in Life-science Research
Efficient information retrieval and extraction is a major challenge in life-science research. The Knowledge Management (KM) for biomedical literature aims to establish an environment, utilizing information technologies, to facilitate better acquisition, generation, codification, and transfer of knowledge. Knowledge Discovery in Text (KDT) is one of the goals in KM, so as to find hidden informat...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملBiomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges
Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text which has been entered in a non-standardized format, consequently posing a lot of challenges to processing of such data. For the clinical doctor the written text in the medical findings is still the basis for decision making – neither images nor multimedia data. However...
متن کاملRelational and Semantic Data Mining for Biomedical Research
The paper presents a historical overview of data mining tools and applications in the field of biomedical research, developed at the Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia. It first outlines subgroup discovery and selected relational data mining approaches, with the emphasis on propositionalization and relational subgroup discovery, which prove to be e...
متن کامل